Long-branch attraction in species tree estimation: inconsistency of partitioned likelihood and topology-based summary methods

Authors

  • Sebastien Roch
  • Michael Nute
  • Tandy Warnow
Abstract

With advances in sequencing technologies, there are now massive amounts of genomic data from across all life, leading to the possibility that a robust Tree of Life can be constructed. However, “gene tree heterogeneity”, which is when different genomic regions can evolve differently, is a common phenomenon in multi-locus datasets, and reduces the accuracy of standard methods for species tree estimation that do not take this heterogeneity into account. New methods have been developed for species tree estimation that specifically address gene tree heterogeneity, and that have been proven to converge to the true species tree when the number of loci and number of sites per locus both increase (i.e., the methods are said to be “statistically consistent”). Yet, little is known about the biologically realistic condition where the number of sites per locus is bounded. We show that when the sequence length of each locus is bounded (by any arbitrarily chosen value), the most common approaches to species tree estimation that take heterogeneity into account (i.e., traditional fully partitioned concatenated maximum likelihood and newer approaches, called summary methods, that estimate the species tree by combining gene trees) are not statistically consistent, even when the heterogeneity is extremely constrained. The main challenge is the presence of conditions such as long branch attraction that create biased tree estimation when the number of sites is restricted. Hence, our study uncovers a fundamental challenge to species tree estimation using both traditional and new methods.

  • ∗ Department of Mathematics, University of Wisconsin–Madison, 480 Lincoln Dr, Madison WI 53706
  • † Department of Statistics, The University of Illinois at Urbana-Champaign, 725 S Wright St #101, Champaign IL 61820
  • ‡ Department of Computer Science, The University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana IL 61801-2302

arXiv:1803.02800v1 [q-bio.PE] 7 Mar 2018
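The long branch attraction mentioned in the abstract can be illustrated with a small calculation. The sketch below is not the paper's argument and does not implement partitioned maximum likelihood or a summary method; it swaps in the classical four-taxon "Felsenstein zone" setting under the two-state symmetric (CFN) model and simply counts parsimony-informative site patterns. The function names, the tree labelling, and the branch parameters (`p_long`, `q_short`) are all assumptions chosen for illustration.

```python
from itertools import product


def edge_prob(x, y, p):
    """CFN transition: the state flips along an edge with probability p."""
    return p if x != y else 1.0 - p


def split_support(p_long, q_short):
    """Return the per-site probability of each parsimony-informative pattern.

    Hypothetical setup: true tree AB|CD with internal nodes u (parent of A, B)
    and v (parent of C, D); A and C sit on long branches (flip probability
    p_long), while B, D, and the internal edge are short (q_short).
    """
    support = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    total = 0.0
    for a, b, c, d in product((0, 1), repeat=4):
        prob = 0.0
        # Sum over the unobserved internal states; the root state at u is uniform.
        for u, v in product((0, 1), repeat=2):
            prob += (0.5 * edge_prob(u, a, p_long) * edge_prob(u, b, q_short)
                     * edge_prob(u, v, q_short)
                     * edge_prob(v, c, p_long) * edge_prob(v, d, q_short))
        total += prob
        if a == b and c == d and a != c:
            support["AB|CD"] += prob
        elif a == c and b == d and a != b:
            support["AC|BD"] += prob
        elif a == d and b == c and a != b:
            support["AD|BC"] += prob
    assert abs(total - 1.0) < 1e-12  # the 16 leaf patterns form a distribution
    return support


# Two long non-sister branches versus all-short branches (illustrative values).
felsenstein_zone = split_support(p_long=0.4, q_short=0.05)
balanced = split_support(p_long=0.05, q_short=0.05)
print(felsenstein_zone)
print(balanced)
```

With the long-branch values, the pattern that groups A with C is expected more often per site than the pattern supporting the true split AB|CD, so a pattern-counting method is pulled toward the wrong tree no matter how many sites are sampled. With all branches short, the true split dominates. The paper's results concern a different and subtler situation (bounded sites per locus with many loci), so this is only intuition for the kind of bias involved.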

Similar resources

Twisted trees and inconsistency of tree estimation when gaps are treated as missing data - The impact of model mis-specification in distance corrections.

Statistically consistent estimation of phylogenetic trees or gene trees is possible if pairwise sequence dissimilarities can be converted to a set of distances that are proportional to the true evolutionary distances. Susko et al. (2004) reported some strikingly broad results about the forms of inconsistency in tree estimation that can arise if corrected distances are not proportional to the tr...

Can quartet analyses combining maximum likelihood estimation and Hennigian logic overcome long branch attraction in phylogenomic sequence data?

Systematic biases such as long branch attraction can mislead commonly relied upon model-based (i.e. maximum likelihood and Bayesian) phylogenetic methods when, as is usually the case with empirical data, there is model misspecification. We present PhyQuart, a new method for evaluating the three possible binary trees for any quartet of taxa. PhyQuart was developed through a process of reciprocal...

The effect of branch lengths on phylogeny: an empirical study using highly conserved orthologs from mammalian genomes.

Phylogenetic analyses were applied to 269 families of putative orthologs represented by a single member in the genomes of human, mouse, dog, and chicken. Five methods were used: maximum parsimony (MP), neighbor-joining (NJ) with Poisson and Gamma distances, and maximum likelihood (ML) with JTT and JTT+gamma models. When applied to the concatenated sequence of all families, all methods strongly ...

A call for likelihood phylogenetics even when the process of sequence evolution is heterogeneous.

All methods of phylogenetic inference make assumptions about the underlying evolutionary process of their characters and it is these assumptions that determine their relative successes and failures in the estimation of the true phylogeny for a group (Hillis, 1995). This dependency of phylogenetic accuracy and robustness on evolutionary assumptions has been most extensively studied for the class...

Evaluation of estimation methods for parameters of the probability functions in tree diameter distribution modeling

One of the most commonly used statistical models for characterizing the variations of tree diameter at breast height is the Weibull distribution. The usual approach for estimating parameters of a statistical model is the maximum likelihood estimation (likelihood method). Usually, this works based on iterative algorithms such as Newton-Raphson. However, the efficiency of the likelihood method is not...

Publication date: 2018